Abstract. We study the eigenvalues of the covariance matrix $\frac{1}{n} M^* M$
of a large rectangular matrix $M = M_{n,p} = (\zeta_{ij})_{1 \le i \le p; 1 \le j \le n}$ whose entries
are iid random variables of mean zero, variance one, and having finite $C_0$-th
moment for some sufficiently large constant $C_0$.

The main result of this paper is a Four Moment Theorem for iid covariance
matrices (analogous to the Four Moment Theorem for Wigner matrices established
by the authors in [48] (see also [49])). We can use this theorem together
with existing results to establish universality of local statistics of eigenvalues
under mild conditions.

As a byproduct of our arguments, we also extend our previous results on
random Hermitian matrices to the case in which the entries have finite $C_0$-th
moment rather than exponential decay.



1. Introduction 



1.1. The model. The main purpose of this paper is to study the asymptotic local 
eigenvalue statistics of covariance matrices of large random matrices. Let us first 
fix the matrix ensembles that we will be studying. 

Definition 1 (Random covariance matrices). Let $n$ be a large integer parameter
going off to infinity, and let $p = p(n)$ be another integer parameter such that $p \le n$
and $\lim_{n \to \infty} p/n = y$ for some $0 < y \le 1$. We let
$M = M_{n,p} = (\zeta_{ij})_{1 \le i \le p, 1 \le j \le n}$
be a random $p \times n$ matrix, whose distribution is allowed to depend on
$n$. We say that the matrix ensemble $M$ obeys condition C1 with some exponent
$C_0 \ge 2$ if the random variables $\zeta_{ij}$ are jointly independent, have mean zero and
variance 1, and obey the moment condition $\sup_{i,j} E|\zeta_{ij}|^{C_0} \le C$ for some constant
$C$ independent of $n, p$. We say that the matrix $M$ is iid if the $\zeta_{ij}$ are identically
and independently distributed with law independent of $n, p$.

Given such a matrix, we form the $n \times n$ covariance matrix $W = W_{n,p} := \frac{1}{n} M^* M$.
This matrix has rank $p$ and so the first $n - p$ eigenvalues are trivial; we order the
(necessarily positive) remaining eigenvalues of these matrices (counting multiplicity)
as

$0 \le \lambda_1(W) \le \ldots \le \lambda_p(W)$.

We often abbreviate $\lambda_i(W)$ as $\lambda_i$.



Note that the only distributional hypotheses we require on the entries $\zeta_{ij}$, besides
the crucial joint independence hypothesis, are moment conditions. In particular,
we make no distinction between continuous and discrete distributions here.

Remark 2. In this paper we will focus primarily on the case $y = 1$, but several
of our results extend to other values of $y$ as well. The case $p > n$ can of course be
deduced from the $p < n$ case after some minor notational changes by transposing
the matrix $M$, which does not affect the non-trivial eigenvalues of the covariance
matrix. One can also easily normalise the variance of the entries to be some other
quantity $\sigma^2$ than 1 if one wishes. Observe that the quantities $\sigma_i := \sqrt{n} \lambda_i^{1/2}$ can
be interpreted as the non-trivial singular values of the original matrix $M$, and
$\lambda_1, \ldots, \lambda_p$ can also be interpreted as the eigenvalues of the $p \times p$ matrix $\frac{1}{n} M M^*$. It
will be convenient to exploit all three of these spectral interpretations of $\lambda_1, \ldots, \lambda_p$
in this paper. Condition C1 is analogous to Condition C0 for Wigner-type matrices
in [48], but with the exponential decay hypothesis relaxed to polynomial decay only.
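The three spectral interpretations in Remark 2 can be checked numerically on a small example. The following is only an illustrative sketch: the sizes `p`, `n`, the random seed, and the choice of complex Gaussian entries are assumptions made for the demonstration and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, chosen arbitrarily
p, n = 4, 7                      # small illustrative sizes with p <= n

# Complex Gaussian entries with mean zero and variance one (an illustrative choice).
M = (rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))) / np.sqrt(2)

# Eigenvalues of the n x n matrix W = (1/n) M^* M, in increasing order.
eig_W = np.linalg.eigvalsh(M.conj().T @ M / n)
trivial, nontrivial = eig_W[: n - p], eig_W[n - p:]

# Eigenvalues of the p x p matrix (1/n) M M^*.
eig_small = np.linalg.eigvalsh(M @ M.conj().T / n)

# Singular values of M, sorted increasingly so that sigma_i = sqrt(n * lambda_i).
sigma = np.sort(np.linalg.svd(M, compute_uv=False))

print(np.allclose(trivial, 0.0, atol=1e-10))          # first n - p eigenvalues vanish
print(np.allclose(nontrivial, eig_small))             # same non-trivial spectrum
print(np.allclose(sigma, np.sqrt(n * nontrivial)))    # singular value interpretation
```

All three comparisons should agree up to floating point error, since they are algebraic identities rather than probabilistic statements.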



The well-known Marchenko-Pastur law governs the bulk distribution of the
eigenvalues $\lambda_1, \ldots, \lambda_p$ of $W$:

Theorem 3 (Marchenko-Pastur law). Assume Condition C1 with $C_0 > 2$, and
suppose that $p/n \to y$ for some $0 < y \le 1$. Then for any $x > 0$, the random
variables

$\frac{1}{p} |\{1 \le i \le p : \lambda_i(W) \le x\}|$

converge in probability to $\int_0^x \rho_{MP,y}(t)\, dt$, where

(1) $\rho_{MP,y}(x) := \frac{1}{2\pi x y} \sqrt{(b - x)(x - a)}\, 1_{[a,b]}(x)$

and

(2) $a := (1 - \sqrt{y})^2; \quad b := (1 + \sqrt{y})^2$.

When furthermore $M$ is iid, one can also obtain the case $C_0 = 2$.
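As a sanity check on the density (1) and the endpoints (2), one can verify numerically that the density has total mass one. The sketch below uses only the Python standard library; the function names `mp_density` and `mp_total_mass`, the midpoint rule, and the sample value $y = 1/2$ are illustrative choices, not part of the paper.

```python
import math

def mp_density(x, y):
    """Marchenko-Pastur density rho_{MP,y}(x) as in (1), for 0 < y <= 1."""
    a = (1 - math.sqrt(y)) ** 2
    b = (1 + math.sqrt(y)) ** 2
    if x <= a or x >= b:
        return 0.0
    return math.sqrt((b - x) * (x - a)) / (2 * math.pi * x * y)

def mp_total_mass(y, steps=200_000):
    """Midpoint-rule approximation of the integral of the density over [a, b]."""
    a = (1 - math.sqrt(y)) ** 2
    b = (1 + math.sqrt(y)) ** 2
    h = (b - a) / steps
    return h * sum(mp_density(a + (k + 0.5) * h, y) for k in range(steps))

# The result should be close to 1 for any 0 < y <= 1.
print(mp_total_mass(0.5))
```

For $y = 1$ the left endpoint is $a = 0$ and the density has an integrable singularity there, so a plain midpoint rule converges more slowly in that case.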



Proof. For the case $C_0 > 4$, see [32], [38]; for the case $C_0 > 2$, see [52]; for the
$C_0 = 2$ iid case, see [53]. Further results are known on the rate of convergence: see



In this paper we are concerned instead with the local eigenvalue statistics. A model
case is the (complex) Wishart ensemble, in which the $\zeta_{ij}$ are iid variables which are
complex gaussians with mean zero and variance 1. In this case, the distribution of
the eigenvalues $(\lambda_1, \ldots, \lambda_n)$ of $W$ can be explicitly computed (as a special case of
the Laguerre unitary ensemble). For instance, when $p = n$, the joint distribution is
given by the density function

(3) $\rho_n(\lambda_1, \ldots, \lambda_n) = c(n) \prod_{1 \le i < j \le n} |\lambda_i - \lambda_j|^2 \exp(-n \sum_{i=1}^n \lambda_i)$

for some explicit normalization constant $c(n)$ whose exact value is not important
for this discussion.

Very similarly to the GUE case, one can use this explicit formula to directly
compute several local statistics, including the distribution of the largest and smallest
eigenvalues [7], the correlation functions [36], etc. Also in similarity to the GUE
case, it is widely conjectured that these statistics hold for a much larger class of
random matrices. For some earlier results in this direction, we refer to [151 IT71 151 ITS] 
and the references therein. 

The goal of this paper is to establish a Four Moment Theorem for random covariance
matrices, as an analogue of a recent result in [48]. This theorem asserts that
all local statistics of the eigenvalues of $W$ are determined by the first four moments
of the entries.



1.2. The Four Moment Theorem. We first need some definitions. 
Definition 4 (Frequent events). [48] Let $E$ be an event depending on $n$.

• $E$ holds asymptotically almost surely if $P(E) = 1 - o(1)$.

• $E$ holds with high probability if $P(E) \ge 1 - O(n^{-c})$ for some constant $c > 0$
(independent of $n$).

• $E$ holds with overwhelming probability if $P(E) \ge 1 - O_C(n^{-C})$ for every
constant $C > 0$ (or equivalently, that $P(E) \ge 1 - \exp(-\omega(\log n))$).

• $E$ holds almost surely if $P(E) = 1$.

Definition 5 (Matching). We say that two complex random variables $\zeta, \zeta'$ match
to order $k$ for some integer $k \ge 1$ if one has
$E \mathrm{Re}(\zeta)^m \mathrm{Im}(\zeta)^l = E \mathrm{Re}(\zeta')^m \mathrm{Im}(\zeta')^l$ for
all $m, l \ge 0$ with $m + l \le k$.
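Definition 5 can be made concrete for finitely supported laws, where the mixed moments are finite sums. The sketch below is illustrative only: the helper names `mixed_moment` and `match_to_order`, the tolerance, and the two example distributions (a random sign and a three-point law) are assumptions chosen for the demonstration.

```python
import math

def mixed_moment(dist, m, l):
    """E[Re(z)^m Im(z)^l] for a finitely supported law given as (value, probability) pairs."""
    return sum(prob * (z.real ** m) * (z.imag ** l) for z, prob in dist)

def match_to_order(dist1, dist2, k, tol=1e-12):
    """Check Definition 5: all mixed moments with m, l >= 0 and m + l <= k agree."""
    return all(
        abs(mixed_moment(dist1, m, l) - mixed_moment(dist2, m, l)) <= tol
        for m in range(k + 1)
        for l in range(k + 1 - m)
    )

# Random sign: +1 or -1 with probability 1/2 each.
sign = [(complex(1), 0.5), (complex(-1), 0.5)]
# Three-point law: +sqrt(3) and -sqrt(3) with probability 1/6 each, 0 with probability 2/3.
r = math.sqrt(3)
three_point = [(complex(r), 1 / 6), (complex(-r), 1 / 6), (complex(0), 2 / 3)]

print(match_to_order(sign, three_point, 3))   # both have mean 0, variance 1, third moment 0
print(match_to_order(sign, three_point, 4))   # fourth moments differ (1 versus 3)
```

Both laws are real, have mean zero and variance one, and are symmetric, so they match to order 3; they stop matching at order 4 because the fourth moments are 1 and 3 respectively.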

Our main result is 

Theorem 6 (Four Moment Theorem). For sufficiently small $c_0 > 0$ and sufficiently
large $C_0 > 0$ ($C_0 = 10^4$ would suffice) the following holds for every $0 < \varepsilon < 1$ and
$k \ge 1$. Let $M = (\zeta_{ij})_{1 \le i \le p, 1 \le j \le n}$ and $M' = (\zeta'_{ij})_{1 \le i \le p, 1 \le j \le n}$ be matrix ensembles
obeying condition C1 with the indicated constant $C_0$, and assume that for each
$i, j$ that $\zeta_{ij}$ and $\zeta'_{ij}$ match to order 4. Let $W, W'$ be the associated covariance
matrices. Assume also that $p/n \to y$ for some $0 < y \le 1$.

Let $G : \mathbb{R}^k \to \mathbb{R}$ be a smooth function obeying the derivative bounds

(4) $|\nabla^j G(x)| \le n^{c_0}$

for all $0 \le j \le 5$ and $x \in \mathbb{R}^k$.

Then for any $\varepsilon p \le i_1 < i_2 < \cdots < i_k \le (1 - \varepsilon) p$, and for $n$ sufficiently large depending
on $\varepsilon, k, c_0$ we have

(5) $|E(G(n\lambda_{i_1}(W), \ldots, n\lambda_{i_k}(W))) - E(G(n\lambda_{i_1}(W'), \ldots, n\lambda_{i_k}(W')))| \le n^{-c_0}$.

If $\zeta_{ij}$ and $\zeta'_{ij}$ only match to order 3 rather than 4, the conclusion (5) still holds
provided that one strengthens (4) to

$|\nabla^j G(x)| \le n^{-j c_1}$

for all $0 \le j \le 5$ and $x \in \mathbb{R}^k$ and any $c_1 > 0$, provided that $c_0$ is sufficiently small
depending on $c_1$.



This is an analogue of [48, Theorem 15] for covariance matrices, with the main
difference being that the exponential decay condition from [48, Theorem 15] has
been weakened to the high moment condition in C1. This is achieved by an
"exponential decay removing trick" that relies on using a truncated version of the four
moment theorem to extend the range of validity of a key "gap condition" that is
used in the proof of the above theorem. The same trick also allows one to obtain a
similar strengthening of the main results of [48, 49], thus relaxing the exponential
decay hypotheses in those results to high moment conditions. The value $C_0 = 10^4$
is ad hoc, and we make no attempt to optimize this constant.

Remark 7. The reason that we restrict the eigenvalues to the bulk of the spectrum
($\varepsilon p \le i \le (1 - \varepsilon) p$) is to guarantee that the density function $\rho_{MP,y}$ is bounded away
from zero. In view of the results in [49], we expect that the result extends to the
edge of the spectrum as well. In particular, in view of the results in [3], it is likely
that the hard edge asymptotics of Forrester [19] can be extended to a wider class of
ensembles. We will pursue this issue elsewhere.

1.3. Applications. One can apply Theorem 6 in a similar way as its counterpart
[48, Theorem 15] in order to obtain universality results for large classes of random
matrices. In many cases, one can combine this theorem with existing partial
results for special ensembles to remove some of the moment assumptions. Let us
demonstrate this through an example concerning the universality of the sine kernel.

Using the explicit formula (3), Nagao and Wadati [35] established the following
result for the complex Wishart ensemble.

Theorem 8 (Sine kernel for Wishart ensemble). [36] Let $k \ge 1$ be an integer, let
$f : \mathbb{R}^k \to \mathbb{C}$ be a continuous function with compact support and symmetric with
respect to permutations, and let $0 < u < 4$; we assume all these quantities are
independent of $n$. Assume that $p = n + O(1)$ (thus $y = 1$), and that $W$ is given by
the complex Wishart ensemble. Let $\lambda_1, \ldots, \lambda_p$ be the non-trivial eigenvalues of $W$.
Then the quantity

(6) $E \sum_{1 \le i_1, \ldots, i_k \le p} f(n \rho_{MP,1}(u) \lambda_{i_1}, \ldots, n \rho_{MP,1}(u) \lambda_{i_k})$

converges as $n \to \infty$ to

$\int_{\mathbb{R}^k} f(t_1, \ldots, t_k) \det(K(t_i, t_j))_{1 \le i, j \le k}\, dt_1 \ldots dt_k$

where $K(x, y) := \frac{\sin(\pi(x - y))}{\pi(x - y)}$ is the sine kernel.

(See Section 3.1 for the asymptotic notation we will be using.)

Remark 9. The results in [36] allowed $f$ to be bounded measurable rather than
continuous, but when we consider discrete ensembles later, it will be important to
keep $f$ continuous.

Returning to the bulk, the following extension was established by Ben Arous and
Péché [3], as a variant of Johansson's result [27] for random Hermitian matrices.
We say that a complex random variable $\zeta$ of mean zero and variance one is Gauss
divisible if $\zeta$ has the same distribution as $\zeta = (1 - t)^{1/2} \zeta' + t^{1/2} \zeta''$ for some $0 < t < 1$
and some independent random variables $\zeta', \zeta''$ of mean zero and variance 1, with
$\zeta''$ distributed according to the complex gaussian.

Theorem 10 (Sine kernel for Gaussian divisible ensemble). [3] Theorem 8 (which
is for the Wishart ensemble and for $p = n + O(1)$) can be extended to the case when
$p = n + O(n^{43/48})$ (so $y$ is still 1), and when $M$ is an iid matrix obeying condition
C1 with $C_0 = 2$, and with the $\zeta_{ij}$ Gauss divisible.

Using Theorem 6 and Theorem 10 (in exactly the same way we used [48, Theorem
15] and Johansson's theorem [27] to establish [48, Theorem 11]), we obtain the
following result:

Theorem 11 (Sine kernel for more general ensembles). Theorem 8 can be extended
to the case when $p = n + O(n^{43/48})$ (so $y$ is still 1), and when $M$ is an iid matrix
obeying condition C1 with $C_0$ sufficiently large ($C_0 = 10^4$ would suffice), and with
the $\zeta_{ij}$ supported on at least three points.

Proof. (Sketch) It was shown in [48, Corollary 30] that if the real and imaginary
parts of a complex random variable $\zeta$ were independent with mean zero and variance
one, and both were supported on at least three points, then $\zeta$ matched to order
4 with a Gauss divisible random variable $\zeta'$ with finite $C_0$ moment (indeed, if one
inspects the convexity argument used to solve the moment problem in [48, Lemma
28], the Gauss divisible random variable could be taken to be the sum of a gaussian
variable and a discrete variable, and in particular is thus exponentially decaying).
If one lets $M'$ be the iid matrix whose entries have the distribution of $\zeta'$, then Theorem 10
asserts that the conclusions of Theorem 8 hold for $M'$. Using Theorem 6 exactly
as in the proof of [48, Theorem 11] (and approximating $f$ uniformly by smooth
functions), we conclude that the conclusions of Theorem 8 hold for $M$ also. □

The arguments in this paper will be a non-symmetric version of those in [48]. The
arguments in [48] started with analyzing the stability of the eigenvalue equation
$M v_i = \lambda_i v_i$ where $M$ is a random Hermitian matrix and $\lambda_i$ is the $i$-th eigenvalue
with eigenvector $v_i$. For the situation considered in this paper, it is tempting to
similarly analyze the eigenvalue equation $W v_i = \lambda_i v_i$ for the covariance matrix $W$.
However, this does not work, since the covariance matrix $W$, while random, does
not have independent entries. The new idea here is to work with a system of two
equations

(7) $M u_i = \sigma_i v_i$

and

(8) $M^* v_i = \sigma_i u_i$

where $u_i$ and $v_i$ are the right and left singular vectors of $M$. This leads to a number
of technical issues that need to be addressed through the paper.

One can combine the singular value equations (7), (8) into a single eigenvalue
equation

$\tilde M \begin{pmatrix} u_i \\ v_i \end{pmatrix} = \sigma_i \begin{pmatrix} u_i \\ v_i \end{pmatrix}$

where $\tilde M$ is the augmented matrix

(9) $\tilde M := \begin{pmatrix} 0 & M^* \\ M & 0 \end{pmatrix}$.

Thus one can view the singular values of an iid matrix as being essentially given
by the eigenvalues of a slightly larger Hermitian matrix which is of Wigner type
except that the entries have been zeroed out on two diagonal blocks. We will take
advantage of this augmented perspective in some parts of the paper (particularly
when we wish to import results from [48] as black boxes), but in other parts it
will in fact be more convenient to work with $M$ directly. In particular, the fact
that many of the entries in (9) are zero (and in particular, have zero mean and
variance) seems to make it difficult to directly apply parts of the arguments from
[48] (particularly those that are probabilistic in nature^1 rather than deterministic)
to the augmented matrix, and we will instead work with $M$ directly in these
cases. Nevertheless one can view this connection as a heuristic explanation as to
why some (but not all) of the machinery in the Hermitian eigenvalue problem can
be transferred to the non-Hermitian singular value problem.
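The link between singular values and the augmented Hermitian matrix can be illustrated numerically. The sketch below assumes the augmented matrix is the $(n+p) \times (n+p)$ block matrix with zero diagonal blocks and off-diagonal blocks $M^*$ and $M$, as obtained by stacking (7) and (8); the sizes, seed, and Gaussian entries are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 3, 5
M = rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))

# Augmented (n + p) x (n + p) Hermitian matrix with zero diagonal blocks
# (assumed block form: M^* in the top right, M in the bottom left).
aug = np.block([
    [np.zeros((n, n)), M.conj().T],
    [M, np.zeros((p, p))],
])

sigma = np.sort(np.linalg.svd(M, compute_uv=False))   # sigma_1 <= ... <= sigma_p
eig_aug = np.linalg.eigvalsh(aug)                     # increasing order

# Expected spectrum: -sigma_p, ..., -sigma_1, then n - p zeros, then sigma_1, ..., sigma_p.
expected = np.concatenate([-sigma[::-1], np.zeros(n - p), sigma])
print(np.allclose(eig_aug, expected))
```

The spectrum of the augmented matrix is symmetric about the origin, which is one way to see that it cannot be treated as a generic Wigner-type matrix.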

1.4. Extensions. In a very recent work, Erdős, Schlein, Yau and Yin [14]
extended^2 Theorem 10 to a large class of matrices, assuming that the distribution of
the entries $\zeta_{ij}$ is sufficiently smooth and obeys a log-Sobolev inequality. While their
results do not apply for entries with discrete distributions, they allow one to extend
Theorem 10 to the case when $t$ is a negative power of $n$. Given this, one can use
the argument in [15] to remove the requirement that the real and imaginary parts
of $\zeta_{ij}$ be supported on at least three points.

^1 A typical instance of a probabilistic argument that encounters difficulty when there are many
zero entries arises when one wants to estimate the distance $\mathrm{dist}(X, V)$ between a random vector
$X = (\xi_1, \ldots, \xi_n)$ (which one should think of as something like a row of $M$) and a fixed subspace $V$.
If all the entries of $X$ are iid with mean zero and constant variance, then an easy second moment
computation allows one to control $E\, \mathrm{dist}(X, V)^2$ exactly in terms of the codimension of $V$; in
particular, no knowledge of the orientation of $V$ is required. One also obtains reasonable upper
and lower bounds on this quantity if the variance is not constant, but is bounded above and
below. However, if many of the entries of $X$ have zero variance (i.e. they vanish), then one has
difficulty lower bounding $E\, \mathrm{dist}(X, V)^2$ because one has to somehow exclude the possibility that
the normal vectors to $V$ have almost all of their $\ell^2$ mass supported on those zero variance entries.
We do not know how to address this problem in general. Note added in proof: Several months
after the submission of this paper, Erdős, Yau, and Yin [16, 17] were able to obtain universality
results for some classes of generalized Wigner matrices (such as band matrices) in which some
entries are permitted to have zero variance. However, one of their key assumptions is that the
matrix of (normalised) variances has a simple eigenvalue at 1, and this assumption does not hold
for the augmented matrix (9).

^2 Even more recently, a similar result was also established by Péché [39].

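We also have the following analogue of [15, Theorem 2].

Theorem 12 (Universality of averaged correlation function). Fix $\varepsilon > 0$ and $u$
such that $0 < u - \varepsilon < u + \varepsilon < 4$. Let $k \ge 1$ and let $f : \mathbb{R}^k \to \mathbb{R}$ be a continuous,
compactly supported function, and let $W = W_{n,n}$ be a random covariance matrix,
with $n$ assumed large depending on $u, \varepsilon, k$. Then the quantity

(10) $\frac{1}{2\varepsilon} \int_{u-\varepsilon}^{u+\varepsilon} \int_{\mathbb{R}^k} f(\alpha_1, \ldots, \alpha_k) \frac{1}{[\rho_{MP,1}(u')]^k} \rho_n^{(k)}\left(u' + \frac{\alpha_1}{n \rho_{MP,1}(u')}, \ldots, u' + \frac{\alpha_k}{n \rho_{MP,1}(u')}\right) d\alpha_1 \ldots d\alpha_k\, du'$

converges as $n \to \infty$ to

$\int_{\mathbb{R}^k} f(\alpha_1, \ldots, \alpha_k) \det(K(\alpha_i, \alpha_j))_{i,j=1}^k\, d\alpha_1 \ldots d\alpha_k$,

where $K(x, y)$ is the Dyson sine kernel

(11) $K(x, y) := \frac{\sin(\pi(x - y))}{\pi(x - y)}$,

and the $k$-point correlation function $\rho_n^{(k)}$ is the unique symmetric probability
distribution such that

$\int_{\mathbb{R}^k} f(\alpha_1, \ldots, \alpha_k) \rho_n^{(k)}(\alpha_1, \ldots, \alpha_k)\, d\alpha_1 \ldots d\alpha_k := \frac{1}{\binom{n}{k}} E \sum_{1 \le i_1 < \ldots < i_k \le n} f(\lambda_{i_1}, \ldots, \lambda_{i_k})$

for all test functions $f$. (If $W$ is a discrete ensemble, one has to interpret $\rho_n^{(k)}$ as
a distribution or a probability measure rather than as a function.)

The details are essentially the same as in [15], and are omitted.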

Remark 13. The four moment theorem controls the distribution of individual
eigenvalues (or singular values) $\lambda_i$, but as indicated above, this control can then be
used to obtain control of correlation expressions such as (10). The local relaxation
flow methods developed in [5]-[14], by contrast, are focused on individual energy
levels $u$ rather than individual eigenvalues. As such, they provide an alternate
approach to controlling correlation expressions such as (10), but we do not know how
to convert such information back to control on individual eigenvalues or singular
values in general, because the standard deviation of each eigenvalue can exceed (by
a logarithmic factor, see [25]) the scale of the mean eigenvalue spacing, which is the
scale at which the correlation estimates operate.



1.5. Acknowledgments. We thank Horng-Tzer Yau for references, and the anony- 
mous referee for helpful comments. 



2. The gap property and the exponential decay removing trick 



The following property plays an important role in [48].

Definition 14 (Gap property). Let $M$ be a matrix ensemble obeying condition
C1. We say that $M$ obeys the gap property if for every $\varepsilon, c > 0$ (independent of
$n$), and for every $\varepsilon p \le i \le (1 - \varepsilon) p$, one has $|\lambda_{i+1}(W) - \lambda_i(W)| \ge n^{-1-c}$ with high
probability. (The implied constants in this statement can depend of course on $\varepsilon$
and $c$.)

As an analogue of [48, Theorem 19], we prove the following theorem, using the
same method with some modifications.

Theorem 15 (Gap theorem). Let $M = (\zeta_{ij})_{1 \le i \le p, 1 \le j \le n}$ obey condition C1 for
some $C_0$, and suppose that the coefficients $\zeta_{ij}$ are exponentially decaying in the
sense that $P(|\zeta_{ij}| \ge t^C) \le \exp(-t)$ for all $t \ge C'$ for all $i, j$ and some constants
$C, C' > 0$. Then $M$ obeys the gap property.

Next, we have the following analogue of [48, Theorem 15].

Theorem 16 (Four Moment Theorem with Gap assumption). For sufficiently
small $c_0 > 0$ and sufficiently large $C_0 > 0$ ($C_0 = 10^4$ would suffice) the
following holds for every $0 < \varepsilon < 1$ and $k \ge 1$. Let $M = (\zeta_{ij})_{1 \le i \le p, 1 \le j \le n}$ and
$M' = (\zeta'_{ij})_{1 \le i \le p, 1 \le j \le n}$ be matrix ensembles obeying condition C1 with the
indicated constant $C_0$, and assume that for each $i, j$ that $\zeta_{ij}$ and $\zeta'_{ij}$ match to order
4. Let $W, W'$ be the associated covariance matrices. Assume also that $M$ and $M'$
obey the gap property, and that $p/n \to y$ for some $0 < y \le 1$.

Let $G : \mathbb{R}^k \to \mathbb{R}$ be a smooth function obeying the derivative bounds

(12) $|\nabla^j G(x)| \le n^{c_0}$

for all $0 \le j \le 5$ and $x \in \mathbb{R}^k$.

Then for any $\varepsilon p \le i_1 < i_2 < \cdots < i_k \le (1 - \varepsilon) p$, and for $n$ sufficiently large depending
on $\varepsilon, k, c_0$ we have

(13) $|E(G(n\lambda_{i_1}(W), \ldots, n\lambda_{i_k}(W))) - E(G(n\lambda_{i_1}(W'), \ldots, n\lambda_{i_k}(W')))| \le n^{-c_0}$.

If $\zeta_{ij}$ and $\zeta'_{ij}$ only match to order 3 rather than 4, the conclusion (13) still holds
provided that one strengthens (12) to

$|\nabla^j G(x)| \le n^{-j c_1}$

for all $0 \le j \le 5$ and $x \in \mathbb{R}^k$ and any $c_1 > 0$, provided that $c_0$ is sufficiently small
depending on $c_1$.

This theorem is weaker than Theorem 6 as we assume the gap property. Besides
the fact that we consider singular values here instead of eigenvalues, the main
difference between this result and [48, Theorem 15] is that in the latter we assume
exponential decay rather than the gap property. However, this difference is only a
formality, since in the proof of [48, Theorem 15], the only place we used exponential
decay is to prove the gap property (via [48, Theorem 19]).

The core of the proof of Theorem 16 is a truncated four moment theorem (Theorem
31), which allows us to insert information such as the gap property into the test
function $G$.

By combining Theorem 16 with Theorem 15 we obtain Theorem 6 in the case
when the coefficients $\zeta_{ij}$ are exponentially decaying. To remove the exponential
decay hypothesis, we will apply the truncated four moment theorem (Theorem
31) a second time, together with a moment matching argument (Lemma 33), to
eliminate this hypothesis from Theorem 15:

Theorem 17 (Gap theorem). Assume that $M = (\zeta_{ij})_{1 \le i \le p, 1 \le j \le n}$ satisfies
condition C1 with $C_0$ sufficiently large. Then $M$ obeys the gap property.

Theorem 6 follows directly from Theorems 16 and 17.

The rest of the paper is organized as follows. The next three sections are devoted
to technical lemmas. The proofs of Theorems 16 and 17 are presented in Section
6, assuming Theorems 31 and 15. The proofs of these latter two theorems are
presented in Sections 7 and 8 respectively.



3. The main technical lemmas 

Important note. The arguments in this paper are very similar to, and draw heavily
from, the previous paper [48] of the authors. We recommend therefore that the
reader be familiar with that paper first, before reading the current one.

In the proof of the Four Moment Theorem (as well as the Gap Theorem) for
$n \times n$ Wigner matrices in [48], a crucial ingredient was a variant of the Delocalization
Theorem of Erdős, Schlein, and Yau [9], [10], [11]. This result asserts (assuming
uniformly exponentially decaying distribution for the coefficients) that with
overwhelming probability, all the unit eigenvectors of the Wigner matrix have
coefficients $O(n^{-1/2+o(1)})$ (thus the "$\ell^2$ energy" of the eigenvector is spread out more
or less uniformly amongst the $n$ coefficients). When one just assumes uniformly
bounded $C_0$ moment rather than uniform exponential decay, the bound becomes
$O(n^{-1/2+O(1/C_0)})$ instead (where the implied constant in the exponent is uniform
in $C_0$, of course).

Similarly, to prove the Four Moment and Gap Theorems in this paper, we will need
a Delocalization theorem for the singular vectors of the matrix $M$. We define a
right singular vector $u_i$ (resp. left singular vector $v_i$) with singular value
$\sigma_i(M) = \sqrt{n} \lambda_i(W)^{1/2}$ to be an eigenvector of $W = \frac{1}{n} M^* M$ (resp. $\frac{1}{n} M M^*$) with
eigenvalue $\lambda_i$. In the generic case when the singular values are simple (i.e.
$0 < \sigma_1 < \ldots < \sigma_p$), we observe from the singular value decomposition that one can find
orthonormal bases $u_1, \ldots, u_p \in \mathbb{C}^n$ and $v_1, \ldots, v_p \in \mathbb{C}^p$ for the corange $\ker(M)^\perp$ of
$M$ and of $\mathbb{C}^p$ respectively, such that

$M u_i = \sigma_i v_i$

and

$M^* v_i = \sigma_i u_i$.

Furthermore, in the generic case the unit singular vectors $u_i, v_i$ are determined up
to multiplication by a complex phase $e^{\sqrt{-1}\theta}$.

We will establish the following Erdős-Schlein-Yau type delocalization theorem
(analogous to [48, Proposition 62]), which is an essential ingredient to Theorems
15 and 31, and is also of some independent interest:

Theorem 18 (Delocalization theorem). Suppose that $p/n \to y$ for some $0 < y \le 1$,
and let $M$ obey condition C1 for some $C_0 \ge 2$. Suppose further that $|\zeta_{ij}| \le K$
almost surely for some $K \ge 1$ (which can depend on $n$) and all $i, j$, and that the
probability distribution of $M$ is continuous. Let $\varepsilon > 0$ be independent of $n$. Then
with overwhelming probability, all the unit left and right singular vectors of $M$ with
eigenvalue $\lambda_i$ in the interval $[a + \varepsilon, b - \varepsilon]$ (with $a, b$ defined in (2)) have all coefficients
uniformly of size $O(K n^{-1/2} \log^{10} n)$.

The factors $K \log^{10} n$ can probably be improved slightly, but anything which is
polynomial in $K$ and $\log n$ will suffice for our purposes. Observe that if $M$ obeys
condition C1, then each event $|\zeta_{ij}| \le K$ with $K := n^{10/C_0}$ (say) occurs with
probability $1 - O(n^{-10})$. Thus in practice, we will be able to apply the above theorem
with $K = n^{10/C_0}$ without difficulty. The continuity hypothesis is a technical one,
imposed so that the singular values are almost surely simple, but in practice we will
be able to eliminate this hypothesis by a limiting argument (as none of the bounds
will depend on any quantitative measure of this continuity).

As with other proofs of delocalization theorems in the literature, Theorem 18 is
in turn deduced from the following eigenvalue concentration bound (analogous to
[48, Proposition 60]):

Theorem 19 (Eigenvalue concentration theorem). Let the hypotheses be as in
Theorem 18, and let $\delta > 0$ be independent of $n$. Then for any interval $I \subset [a + \varepsilon, b - \varepsilon]$
of length $|I| \ge K^2 \log^{20} n / n$, one has with overwhelming probability (uniformly
in $I$) that

$\left| N_I - p \int_I \rho_{MP,y}(x)\, dx \right| \le \delta p |I|$,

where

(14) $N_I := |\{1 \le i \le p : \lambda_i(W) \in I\}|$

is the number of eigenvalues in $I$.

We remark that a very similar result (with slightly different hypotheses on the
parameters and on the underlying random variable distributions) was recently
established in [14, Corollary 7.2].

We isolate one particular consequence of Theorem 19 (also established in [25]):

Corollary 20 (Concentration of the bulk). Let the hypotheses be as in Theorem 18.
Then there exists $\varepsilon' > 0$ independent of $n$ such that with overwhelming probability,
one has $a + \varepsilon' \le \lambda_i(W) \le b - \varepsilon'$ for all $\varepsilon p \le i \le (1 - \varepsilon) p$.

Proof. From Theorem 19 we see with overwhelming probability that the number
of eigenvalues in $[a + \varepsilon', b - \varepsilon']$ is at least $(1 - \varepsilon) p$, if $\varepsilon'$ is sufficiently small depending
on $\varepsilon$. The claim follows. □



3.1. Notation. Throughout this paper, $n$ will be an asymptotic parameter going
to infinity. Some quantities (e.g. $\varepsilon$, $y$ and $C_0$) will remain independent of $n$,
while other quantities (e.g. $p$, or the matrix $M$) will depend on $n$. All statements
here are understood to hold only in the asymptotic regime when $n$ is sufficiently
large depending on all quantities that are independent of $n$. We write $X = O(Y)$,
$Y = \Omega(|X|)$, $|X| \ll Y$, or $Y \gg |X|$ if one has $|X| \le CY$ for all sufficiently large
$n$ and some $C$ independent of $n$. (Note however that $C$ is allowed to depend on
other quantities independent of $n$, such as $\varepsilon$ and $y$, unless otherwise stated.) We
write $X = o(Y)$ if $|X| \le c(n) Y$ where $c(n) \to 0$ as $n \to \infty$. We write $X = \Theta(Y)$
or $X \asymp Y$ if $X \ll Y \ll X$, thus for instance if $p/n \to y$ for some $0 < y \le 1$ then
$p = \Theta(n)$.

We write $\sqrt{-1}$ for the complex imaginary unit, in order to free up the letter $i$ to
denote an integer (usually between 1 and $n$).

We write $\|X\|$ for the length of a vector $X$, $\|A\| = \|A\|_{op}$ for the operator norm of
a matrix $A$, and $\|A\|_F = \mathrm{tr}(AA^*)^{1/2}$ for the Frobenius (or Hilbert-Schmidt) norm.



4. Basic tools 

4.1. Tools from linear algebra. In this section we recall some basic identities 
and inequalities from linear algebra which will be used in this paper. 

We begin with the Cauchy interlacing law and the Weyl inequalities:

Lemma 21 (Cauchy interlacing law). Let $1 \le p \le n$.

(i) If $A_n$ is an $n \times n$ Hermitian matrix, and $A_{n-1}$ is an $(n-1) \times (n-1)$ minor,
then $\lambda_i(A_n) \le \lambda_i(A_{n-1}) \le \lambda_{i+1}(A_n)$ for all $1 \le i < n$.

(ii) If $M_{n,p}$ is a $p \times n$ matrix, and $M_{n,p-1}$ is a $(p-1) \times n$ minor, then
$\sigma_i(M_{n,p}) \le \sigma_i(M_{n,p-1}) \le \sigma_{i+1}(M_{n,p})$ for all $1 \le i < p$.

(iii) If $p < n$, if $M_{n,p}$ is a $p \times n$ matrix, and $M_{n-1,p}$ is a $p \times (n-1)$ minor,
then $\sigma_{i-1}(M_{n,p}) \le \sigma_i(M_{n-1,p}) \le \sigma_i(M_{n,p})$ for all $1 \le i \le p$, with the
understanding that $\sigma_0(M_{n,p}) = 0$. (For $p = n$, one can of course use the
transpose of (ii) instead.)

Proof. Claim (i) follows from the minimax formula

$\lambda_i(A_n) = \inf_{V : \dim(V) = i} \sup_{v \in V : \|v\| = 1} v^* A_n v$

where $V$ ranges over $i$-dimensional subspaces in $\mathbb{C}^n$. Similarly, (ii) and (iii) follow
from the minimax formula

$\sigma_i(M_{n,p}) = \inf_{V : \dim(V) = i + n - p} \sup_{v \in V : \|v\| = 1} \|M_{n,p} v\|$.

□
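The three interlacing statements of Lemma 21 can be tested on random matrices. The sketch below is a numerical illustration only: the sizes, the seed, the Gaussian entries, the choice of deleting the last row or column, and the tolerance `tol` are all assumptions made for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(2)
tol = 1e-10                                   # slack for floating point comparisons

# (i) Hermitian case: delete the last row and column of an n x n Hermitian matrix.
n = 6
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (B + B.conj().T) / 2
lam = np.linalg.eigvalsh(A)                   # lambda_1 <= ... <= lambda_n
lam_minor = np.linalg.eigvalsh(A[:-1, :-1])   # eigenvalues of the minor
ok_i = bool(np.all(lam[:-1] <= lam_minor + tol) and np.all(lam_minor <= lam[1:] + tol))

# (ii) Singular values: delete the last row of a p x n matrix.
p = 4
M = rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))
sig = np.sort(np.linalg.svd(M, compute_uv=False))
sig_row = np.sort(np.linalg.svd(M[:-1, :], compute_uv=False))
ok_ii = bool(np.all(sig[:-1] <= sig_row + tol) and np.all(sig_row <= sig[1:] + tol))

# (iii) Singular values: delete the last column (here p < n, so p values remain).
sig_col = np.sort(np.linalg.svd(M[:, :-1], compute_uv=False))
ok_iii = bool(np.all(sig_col <= sig + tol) and np.all(sig[:-1] <= sig_col[1:] + tol))

print(ok_i, ok_ii, ok_iii)
```

Since the interlacing inequalities are deterministic, all three flags should be true for any choice of matrix, up to the rounding slack.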

Lemma 22 (Weyl inequality). Let $1 \le p \le n$.

• If $A, B$ are $n \times n$ Hermitian matrices, then $|\lambda_i(A) - \lambda_i(B)| \le \|A - B\|_{op}$
for all $1 \le i \le n$.

• If $M, N$ are $p \times n$ matrices, then $|\sigma_i(M) - \sigma_i(N)| \le \|M - N\|_{op}$ for all
$1 \le i \le p$.

Proof. This follows from the same minimax formulae used to establish Lemma
21. □

Remark 23. One can also deduce the singular value versions of Lemmas 21, 22
from their Hermitian counterparts by using the augmented matrices (9). We omit
the details.
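The singular value form of the Weyl inequality can be illustrated with a small perturbation experiment. This is a sketch under illustrative assumptions: the sizes, seed, and perturbation scale `0.1` are arbitrary choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
p, n = 4, 6
M = rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))
# N is a small random perturbation of M.
N = M + 0.1 * (rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n)))

sig_M = np.sort(np.linalg.svd(M, compute_uv=False))
sig_N = np.sort(np.linalg.svd(N, compute_uv=False))

# Operator norm of the difference (largest singular value of M - N).
op_norm = np.linalg.norm(M - N, ord=2)

# Largest shift of any ordered singular value.
max_shift = np.max(np.abs(sig_M - sig_N))
print(max_shift <= op_norm + 1e-12)
```

The comparison is between singular values with the same index after sorting, which is the pairing used in Lemma 22.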



We have the following elementary formula for a component of an eigenvector of a
Hermitian matrix, in terms of the eigenvalues and eigenvectors of a minor:

Lemma 24 (Formula for coordinate of an eigenvector). [9] Let

$A_n = \begin{pmatrix} A_{n-1} & X \\ X^* & a \end{pmatrix}$

be an $n \times n$ Hermitian matrix for some $a \in \mathbb{R}$ and $X \in \mathbb{C}^{n-1}$, and let
$\begin{pmatrix} v \\ x \end{pmatrix}$ be a unit eigenvector of $A_n$ with eigenvalue $\lambda_i(A_n)$,
where $x \in \mathbb{C}$ and $v \in \mathbb{C}^{n-1}$. Suppose
that none of the eigenvalues of $A_{n-1}$ are equal to $\lambda_i(A_n)$. Then

$|x|^2 = \frac{1}{1 + \sum_{j=1}^{n-1} (\lambda_j(A_{n-1}) - \lambda_i(A_n))^{-2} |u_j(A_{n-1})^* X|^2}$,

where $u_1(A_{n-1}), \ldots, u_{n-1}(A_{n-1}) \in \mathbb{C}^{n-1}$ is an orthonormal eigenbasis corresponding
to the eigenvalues $\lambda_1(A_{n-1}), \ldots, \lambda_{n-1}(A_{n-1})$ of $A_{n-1}$.

Proof. See e.g. [151 Lemma 41]. □
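The coordinate formula of Lemma 24 is an exact identity and can be checked numerically. In the sketch below the matrix size, seed, and the eigenvector index `i` are illustrative assumptions; indices in the code are 0-based, unlike in the lemma.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (B + B.conj().T) / 2                     # n x n Hermitian matrix A_n

A_minor = A[:-1, :-1]                        # top left minor A_{n-1}
X = A[:-1, -1]                               # last column without its last entry

lam, vecs = np.linalg.eigh(A)                # columns of vecs are unit eigenvectors
mu, U = np.linalg.eigh(A_minor)              # eigenpairs of the minor

i = 2                                        # any index 0 <= i < n (0-based)
x = vecs[-1, i]                              # last coordinate of the i-th eigenvector
overlaps = np.abs(U.conj().T @ X) ** 2       # |u_j(A_{n-1})^* X|^2 for each j
rhs = 1.0 / (1.0 + np.sum(overlaps / (mu - lam[i]) ** 2))

print(np.isclose(abs(x) ** 2, rhs))
```

The formula shows that the last coordinate is small exactly when some eigenvalue of the minor is close to $\lambda_i(A_n)$ with a non-negligible overlap $|u_j^* X|$, which is how eigenvalue gaps enter delocalization arguments.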

This implies an analogous formula for singular vectors:

Corollary 25 (Formula for coordinate of a singular vector). Let $p, n \ge 1$, and let

$M_{p,n} = \begin{pmatrix} M_{p,n-1} & X \end{pmatrix}$

be a $p \times n$ matrix for some $X \in \mathbb{C}^p$, and let $\begin{pmatrix} u \\ x \end{pmatrix}$ be a right unit singular vector of
$M_{p,n}$ with singular value $\sigma_i(M_{p,n})$, where $x \in \mathbb{C}$ and $u \in \mathbb{C}^{n-1}$. Suppose that none
of the singular values of $M_{p,n-1}$ are equal to $\sigma_i(M_{p,n})$. Then

$|x|^2 = \frac{1}{1 + \sum_{j=1}^{\min(p,n-1)} \frac{\sigma_j(M_{p,n-1})^2}{(\sigma_j(M_{p,n-1})^2 - \sigma_i(M_{p,n})^2)^2} |v_j(M_{p,n-1})^* X|^2}$,

where $v_1(M_{p,n-1}), \ldots, v_{\min(p,n-1)}(M_{p,n-1}) \in \mathbb{C}^p$ is an orthonormal system of left
singular vectors corresponding to the non-trivial singular values of $M_{p,n-1}$.

In a similar vein, if

$M_{p,n} = \begin{pmatrix} M_{p-1,n} \\ Y^* \end{pmatrix}$

for some $Y \in \mathbb{C}^n$, and $\begin{pmatrix} v \\ y \end{pmatrix}$ is a left unit singular vector of $M_{p,n}$ with singular
value $\sigma_i(M_{p,n})$, where $y \in \mathbb{C}$ and $v \in \mathbb{C}^{p-1}$, and none of the singular values of
$M_{p-1,n}$ are equal to $\sigma_i(M_{p,n})$, then

$|y|^2 = \frac{1}{1 + \sum_{j=1}^{\min(p-1,n)} \frac{\sigma_j(M_{p-1,n})^2}{(\sigma_j(M_{p-1,n})^2 - \sigma_i(M_{p,n})^2)^2} |u_j(M_{p-1,n})^* Y|^2}$,

where $u_1(M_{p-1,n}), \ldots, u_{\min(p-1,n)}(M_{p-1,n}) \in \mathbb{C}^n$ is an orthonormal system of right
singular vectors corresponding to the non-trivial singular values of $M_{p-1,n}$.

Proof. We just prove the first claim, as the second is proven analogously (or by
taking adjoints). Observe that $\begin{pmatrix} u \\ x \end{pmatrix}$ is a unit eigenvector of the matrix

$M_{p,n}^* M_{p,n} = \begin{pmatrix} M_{p,n-1}^* M_{p,n-1} & M_{p,n-1}^* X \\ X^* M_{p,n-1} & |X|^2 \end{pmatrix}$

with eigenvalue $\sigma_i(M_{p,n})^2$. Applying Lemma 24 we obtain

$|x|^2 = \frac{1}{1 + \sum_{j=1}^{n-1} (\lambda_j(M_{p,n-1}^* M_{p,n-1}) - \sigma_i(M_{p,n})^2)^{-2} |u_j(M_{p,n-1}^* M_{p,n-1})^* M_{p,n-1}^* X|^2}$.

But $u_j(M_{p,n-1}^* M_{p,n-1})^* M_{p,n-1}^* = \sigma_j(M_{p,n-1}) v_j(M_{p,n-1})^*$ for the $\min(p, n-1)$
non-trivial singular values (possibly after relabeling the $j$), and vanishes for trivial
ones, and $\lambda_j(M_{p,n-1}^* M_{p,n-1}) = \sigma_j(M_{p,n-1})^2$, so the claim follows. □

The Stieltjes transform s(z) of a Hermitian matrix W is defined for complex z by 
the formula 



It has the following alternate representation (see e.g. [U Chapter 11]): 

Lemma 26. Let $W = (\zeta_{ij})_{1 \leq i,j \leq n}$ be a Hermitian matrix, and let $z$ be a complex
number not in the spectrum of $W$. Then we have

$$s(z) = \frac{1}{n} \sum_{k=1}^{n} \frac{1}{\zeta_{kk} - z - a_k^* (W_k - zI)^{-1} a_k},$$

where $W_k$ is the $(n-1) \times (n-1)$ matrix with the $k$th row and column removed, and
$a_k \in \mathbb{C}^{n-1}$ is the $k$th column of $W$ with the $k$th entry removed.

Proof. By Schur's complement, $\frac{1}{\zeta_{kk} - z - a_k^* (W_k - zI)^{-1} a_k}$ is the $k$th diagonal entry of
$(W - zI)^{-1}$. Taking traces, one obtains the claim. □
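The identity in Lemma 26 can be checked numerically on a small example. The following is a minimal sketch using only the standard library; the Gaussian-elimination helper `solve` and the sample matrix are hypothetical choices made for illustration, and a non-real $z$ is used so that both $W - zI$ and $W_k - zI$ are invertible.

```python
def solve(A, b):
    """Solve A x = b for a small complex square matrix (Gaussian elimination, partial pivoting)."""
    n = len(A)
    aug = [[complex(v) for v in A[i]] + [complex(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0j] * n
    for i in reversed(range(n)):
        rest = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - rest) / aug[i][i]
    return x


def resolvent_diagonal(W, z, k):
    """k-th diagonal entry of (W - zI)^{-1}, obtained by solving (W - zI) x = e_k."""
    n = len(W)
    shifted = [[W[i][j] - (z if i == j else 0) for j in range(n)] for i in range(n)]
    e_k = [1 if i == k else 0 for i in range(n)]
    return solve(shifted, e_k)[k]


def schur_formula(W, z, k):
    """1 / (zeta_kk - z - a_k^* (W_k - zI)^{-1} a_k), the summand in Lemma 26."""
    idx = [i for i in range(len(W)) if i != k]
    shifted_minor = [[W[i][j] - (z if i == j else 0) for j in idx] for i in idx]
    a_k = [complex(W[i][k]) for i in idx]          # k-th column without its k-th entry
    y = solve(shifted_minor, a_k)                  # (W_k - zI)^{-1} a_k
    quad = sum(a.conjugate() * c for a, c in zip(a_k, y))
    return 1 / (W[k][k] - z - quad)


# Hypothetical 3 x 3 Hermitian matrix and spectral parameter.
W = [[2, 1 - 1j, 0.5j], [1 + 1j, -1, 2], [-0.5j, 2, 0.5]]
z = 0.3 + 0.7j
n = len(W)
trace_side = sum(resolvent_diagonal(W, z, k) for k in range(n)) / n
schur_side = sum(schur_formula(W, z, k) for k in range(n)) / n
```

Here `trace_side` is the normalized trace of the resolvent, which equals $s(z)$, and `schur_side` is the right-hand side of Lemma 26.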

4.2. Tools from probability theory. We will rely frequently on the following 
concentration of measure result for projections of random vectors: 

Lemma 27 (Distance between a random vector and a subspace). Let $X = (\xi_1, \ldots, \xi_n) \in \mathbb{C}^n$
be a random vector whose entries are independent with mean zero, variance
1, and are bounded in magnitude by $K$ almost surely for some $K$, where $K \geq
10(\mathbf{E}|\xi|^4 + 1)$. Let $H$ be a subspace of dimension $d$ and $\pi_H$ the orthogonal projection
onto $H$. Then

$$\mathbf{P}\left(\left| \|\pi_H(X)\| - \sqrt{d} \right| \geq t\right) \leq 10 \exp\left(-\frac{t^2}{10 K^2}\right).$$

In particular, one has

$$\|\pi_H(X)\| = \sqrt{d} + O(K \log n)$$

with overwhelming probability.

Proof. See [48, Lemma 43]; the proof is a short application of Talagrand's inequality
[3D]. □ 

5. Delocalization 

The purpose of this section is to establish Theorem 18 and Theorem 19. The
material here is closely analogous to [48, Sections 5.2, 5.3], as well as that of the
original results in [9], [10], [11], and can be read independently of the other sections of
the paper. The recent paper [14] also contains arguments and results closely related
to those in this section.

5.1. Deduction of Theorem 18 from Theorem 19. We begin by showing how
Theorem 18 follows from Theorem 19. We shall just establish the claim for the
right singular vectors $u_i$, as the claim for the left singular vectors is similar. We fix
$\varepsilon$ and allow all implied constants to depend on $\varepsilon$ and $y$. We can also assume that
$K^2 \log^{20} n = o(n)$, as the claim is trivial otherwise.

As the distribution of $M$ is continuous, we see that the non-trivial singular values are almost surely
simple and positive, so that the singular vectors $u_i$ are well defined up to unit
phases. Fix $1 \leq i \leq p$; it suffices by the union bound and symmetry to show that
the event that $\lambda_i$ falls outside $[a + \varepsilon, b - \varepsilon]$ or that the $n$th coordinate $x$ of $u_i$ is
$O(K n^{-1/2} \log^{10} n)$ holds with (uniformly) overwhelming probability.

Applying Corollary 25, it suffices to show that with uniformly overwhelming probability,
either $\lambda_i \notin [a + \varepsilon, b - \varepsilon]$, or

$$\sum_{j=1}^{\min(p,n-1)} \frac{\sigma_j(M_{p,n-1})^2}{(\sigma_j(M_{p,n-1})^2 - \sigma_i(M_{p,n})^2)^2} |v_j(M_{p,n-1})^* X|^2 \gg \frac{n}{K^2 \log^{20} n}, \tag{15}$$



where $M = \begin{pmatrix} M_{p,n-1} & X \end{pmatrix}$. But if $\lambda_i \in [a + \varepsilon, b - \varepsilon]$, then by Theorem 19 one
can find (with uniformly overwhelming probability) a set $J \subset \{1, \ldots, \min(p, n-1)\}$
with $|J| \gg K^2 \log^{20} n$ such that $\lambda_j(M_{p,n-1}) = \lambda_i(M_{p,n}) + O(K^2 \log^{20} n / n)$ for all
$j \in J$; since $\lambda_i = \frac{1}{n} \sigma_i^2$, we conclude that $\sigma_j(M_{p,n-1})^2 = \sigma_i(M_{p,n})^2 + O(K^2 \log^{20} n)$.
In particular, $\sigma_j(M_{p,n-1}) = \Theta(\sqrt{n})$. By Pythagoras' theorem, the left-hand side of
(15) is then bounded from below by

$$\gg \frac{n \|\pi_H X\|^2}{(K^2 \log^{20} n)^2},$$

where $H \subset \mathbb{C}^p$ is the span of the $v_j(M_{p,n-1})$ for $j \in J$. But from Lemma 27 (and
the fact that $X$ is independent of $M_{p,n-1}$), one has

$$\|\pi_H X\|^2 \gg K^2 \log^{20} n$$

with uniformly overwhelming probability, and the claim follows.

It thus remains to establish Theorem 19.



5.2. A crude upper bound. Let the hypotheses be as in Theorem 19. We first
establish a crude upper bound, which illustrates the techniques used to prove
Theorem 19, and also plays an important direct role in that proof:

Proposition 28 (Eigenvalue upper bound). Let the hypotheses be as in Theorem
18; then for any interval $I \subset [a + \varepsilon, b - \varepsilon]$ of length $|I| \geq K \log^2 n / n$, one has with
overwhelming probability (uniformly in $I$) that

$$|N_I| \ll n |I|,$$

where $|I|$ denotes the length of $I$, and $N_I$ was defined in (14).



To prove this proposition, we suppose for contradiction that

$$|N_I| \geq C n |I| \tag{16}$$

for some large constant $C$ to be chosen later. We will show that for $C$ large enough,
this leads to a contradiction with overwhelming probability.

We follow the standard approach (see e.g. [2]) of controlling the eigenvalue counting
function $N_I$ via the Stieltjes transform

$$s(z) := \frac{1}{p} \sum_{j=1}^{p} \frac{1}{\lambda_j(W) - z}.$$

Fix $I$. If $x$ is the midpoint of $I$, $\eta := |I|/2$, and $z := x + \sqrt{-1}\,\eta$, we see that

$$\operatorname{Im} s(z) \gg \frac{|N_I|}{\eta p}$$

(recall that $p = \Theta(n)$), so from (16) one has

$$\operatorname{Im}(s(z)) \gg C. \tag{17}$$
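The lower bound on $\operatorname{Im} s(z)$ comes from the fact that each eigenvalue $\lambda \in I$ contributes $\eta / ((\lambda - x)^2 + \eta^2) \geq 1/(2\eta)$ to $p \operatorname{Im} s(z)$, while all other eigenvalues contribute non-negatively. The short sketch below illustrates this with a hypothetical list of eigenvalues; the numbers are made up for the example and are not data from the paper.

```python
def stieltjes(eigs, z):
    """Stieltjes transform (1/p) * sum_j 1 / (lambda_j - z) of a finite spectrum."""
    return sum(1 / (lam - z) for lam in eigs) / len(eigs)


# Hypothetical spectrum with a cluster inside the interval I.
eigs = [0.1, 0.9, 0.95, 1.0, 1.02, 1.08, 1.5, 2.4, 3.1]
I = (0.9, 1.1)

x = (I[0] + I[1]) / 2            # midpoint of I
eta = (I[1] - I[0]) / 2          # half-length of I
z = complex(x, eta)

N_I = sum(1 for lam in eigs if I[0] <= lam <= I[1])   # eigenvalue count in I
lower_bound = N_I / (2 * eta * len(eigs))             # N_I / (2 * eta * p)
imag_part = stieltjes(eigs, z).imag
```

The comparison `imag_part >= lower_bound` is the explicit-constant form of the estimate $\operatorname{Im} s(z) \gg |N_I| / (\eta p)$.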

(Footnote to Section 5.1.) In the case $p = n$, one would have to replace $M_{p,n-1}$ by its transpose to return to the regime
$p \leq n$.

Applying Lemma 26, with $W$ replaced by the $p \times p$ matrix $W' := \frac{1}{n} M M^*$ (which
only has the non-trivial eigenvalues), we see that

$$s(z) = \frac{1}{p} \sum_{k=1}^{p} \frac{1}{\xi_{kk} - z - a_k^* (W_k - zI)^{-1} a_k}, \tag{18}$$

where $\xi_{kk}$ is the $kk$ entry of $W'$, $W_k$ is the $(p-1) \times (p-1)$ matrix with the $k$th row
and column of $W'$ removed, and $a_k \in \mathbb{C}^{p-1}$ is the $k$th column of $W'$ with the $k$th
entry removed.

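Using the crude bound $|\operatorname{Im} \frac{1}{z}| \leq \frac{1}{|\operatorname{Im}(z)|}$ and (17), one concludes

$$\frac{1}{p} \sum_{k=1}^{p} \frac{1}{|\eta + \operatorname{Im} a_k^* (W_k - zI)^{-1} a_k|} \gg C.$$

By the pigeonhole principle, there exists $1 \leq k \leq p$ such that

$$\frac{1}{|\eta + \operatorname{Im} a_k^* (W_k - zI)^{-1} a_k|} \gg C. \tag{19}$$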

The fact that k varies will cost us a factor of p in our probability estimates, but this 
will not be of concern since all of our claims will hold with overwhelming probability. 

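Fix $k$. Note that

$$a_k = \frac{1}{n} M_k X_k \tag{20}$$

and

$$W_k = \frac{1}{n} M_k M_k^*,$$

where $X_k \in \mathbb{C}^n$ is the (adjoint of the) $k$th row of $M$, and $M_k$ is the $(p-1) \times n$
matrix formed by removing that row. Thus, if we let $v_1(M_k), \ldots, v_{p-1}(M_k) \in \mathbb{C}^{p-1}$
and $u_1(M_k), \ldots, u_{p-1}(M_k) \in \mathbb{C}^n$ be coupled orthonormal systems of left and right
singular vectors of $M_k$, and let $\lambda_j(W_k) = \frac{1}{n} \sigma_j(M_k)^2$ for $1 \leq j \leq p-1$ be the
associated eigenvalues, one has

$$a_k^* (W_k - zI)^{-1} a_k = \sum_{j=1}^{p-1} \frac{|a_k^* v_j(M_k)|^2}{\lambda_j(W_k) - z} \tag{21}$$

and thus

$$\operatorname{Im} a_k^* (W_k - zI)^{-1} a_k \geq \eta \sum_{j=1}^{p-1} \frac{|a_k^* v_j(M_k)|^2}{\eta^2 + |\lambda_j(W_k) - x|^2}.$$

From (19), we conclude that

$$\sum_{j=1}^{p-1} \frac{|a_k^* v_j(M_k)|^2}{\eta^2 + |\lambda_j(W_k) - x|^2} \ll \frac{1}{C \eta}.$$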

The expression $a_k^* v_j(M_k)$ can be rewritten much more favorably using (20) as

$$a_k^* v_j(M_k) = \frac{\sigma_j(M_k)}{n} X_k^* u_j(M_k). \tag{22}$$

The advantage of this latter formulation is that the random variables $X_k$ and
$u_j(M_k)$ are independent (for fixed $k$).

Next, note that from (16) and the Cauchy interlacing law (Lemma 21) one can
find an interval $J \subset \{1, \ldots, p-1\}$ of length

$$|J| \gg C \eta n \tag{23}$$

such that $\lambda_j(W_k) \in I$ for all $j \in J$. Since $|\lambda_j(W_k) - x| \leq \eta$ for such $j$, we conclude that

$$\sum_{j \in J} \frac{\sigma_j(M_k)^2}{n^2} |X_k^* u_j(M_k)|^2 \ll \frac{\eta}{C}.$$

Since $\lambda_j(W_k) \in I$, one has $\sigma_j(M_k) = \Theta(\sqrt{n})$, and thus

$$\sum_{j \in J} |X_k^* u_j(M_k)|^2 \ll \frac{\eta n}{C}.$$

The left-hand side can be rewritten using Pythagoras' theorem as $\|\pi_H X_k\|^2$, where
$H$ is the span of the singular vectors $u_j(M_k)$ for $j \in J$. But from Lemma 27 and (23)
we see that this quantity is $\gg C \eta n$ with overwhelming probability, giving the desired
contradiction with overwhelming probability (even after taking the union bound in
$k$), once $C$ is large enough. This concludes the proof of Proposition 28.



5.3. Reduction to a Stieltjes transform bound. We now begin the proof of
Theorem 19 in earnest. We continue to allow all implied constants to depend on $\varepsilon$
and $y$.

It suffices by a limiting argument (using Lemma 22) to establish the claim under
the assumption that the distribution of $M$ is continuous; our arguments will not
use any quantitative estimates on this continuity.

The strategy is to compare $s$ with the Marchenko-Pastur Stieltjes transform

$$s_{MP,y}(z) := \int_{\mathbb{R}} \rho_{MP,y}(x) \frac{1}{x - z}\, dx.$$

A routine application of (1) and the Cauchy integral formula yields the explicit
formula

$$s_{MP,y}(z) = -\frac{y + z - 1 + \sqrt{(y + z - 1)^2 - 4yz}}{2yz}, \tag{24}$$

where we use the branch of $\sqrt{(y + z - 1)^2 - 4yz}$ with cut at $[a, b]$ that is asymptotic
to $y - z + 1$ as $z \to \infty$. To put it another way, for $z$ in the upper half-plane, $s_{MP,y}(z)$
is the unique solution to the equation

$$s_{MP,y}(z) = -\frac{1}{y + z - 1 + yz\, s_{MP,y}(z)} \tag{25}$$

with $\operatorname{Im} s_{MP,y}(z) > 0$. (Details of these computations can also be found in [2].)
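The closed form (24), the branch convention, and the self-consistent equation (25) can be cross-checked numerically. The sketch below is an illustration under two assumptions that are not spelled out in this section: the Marchenko-Pastur density is taken in its standard form $\rho_{MP,y}(x) = \frac{1}{2\pi x y}\sqrt{(b-x)(x-a)}$ on $[a,b]$ with $a = (1-\sqrt{y})^2$, $b = (1+\sqrt{y})^2$, and the branch asymptotic to $y - z + 1$ is realised as $-\sqrt{z-a}\,\sqrt{z-b}$ with principal square roots, which is valid for $z$ in the upper half-plane. The function names are hypothetical.

```python
import cmath
import math


def mp_edges(y):
    """Endpoints a, b of the Marchenko-Pastur support for ratio y."""
    return (1 - math.sqrt(y)) ** 2, (1 + math.sqrt(y)) ** 2


def s_mp(z, y):
    """Closed form (24) for Im z > 0."""
    a, b = mp_edges(y)
    # (y + z - 1)^2 - 4yz = (z - a)(z - b); the product of principal roots behaves
    # like z at infinity, so its negative is the branch asymptotic to y - z + 1.
    root = -cmath.sqrt(z - a) * cmath.sqrt(z - b)
    return -(y + z - 1 + root) / (2 * y * z)


def s_mp_by_quadrature(z, y, steps=4000):
    """Midpoint rule for the integral of rho(x) / (x - z), with x = c + r cos(theta)."""
    a, b = mp_edges(y)
    c, r = (a + b) / 2, (b - a) / 2
    total = 0j
    for m in range(steps):
        theta = math.pi * (m + 0.5) / steps
        x = c + r * math.cos(theta)
        # rho(x) dx becomes (r sin(theta))^2 / (2 pi x y) d(theta) after substitution
        total += (r * math.sin(theta)) ** 2 / (2 * math.pi * x * y * (x - z))
    return total * math.pi / steps


y, z = 0.5, 1 + 0.5j
s = s_mp(z, y)
residual = abs(s + 1 / (y + z - 1 + y * z * s))          # defect in equation (25)
other_root = -(y + z - 1) / (y * z) - s                   # second solution of (25)
other_residual = abs(other_root + 1 / (y + z - 1 + y * z * other_root))
quadrature_gap = abs(s - s_mp_by_quadrature(z, y))
```

Both roots of the quadratic satisfy (25); only `s` has positive imaginary part and agrees with the integral definition.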

We have the following standard relation between convergence of Stieltjes transform 
and convergence of the counting function: 

Lemma 29 (Control of Stieltjes transform implies control on ESD). Let $1/10 \geq
\eta \geq 1/n$, and $L, \varepsilon, \delta > 0$. Suppose that one has the bound

$$|s_{MP,y}(z) - s(z)| \leq \delta \tag{26}$$

with (uniformly) overwhelming probability for each $z$ with $|\operatorname{Re}(z)| \leq L$ and $\operatorname{Im}(z) \geq
\eta$. Then for any interval $I$ in $[a + \varepsilon, b - \varepsilon]$ with $|I| \geq \max(2\eta, \frac{\eta}{\delta} \log \frac{1}{\delta})$, one has

$$\left| N_I - n \int_I \rho_{MP,y}(x)\, dx \right| \ll \delta n |I|$$

with overwhelming probability.



Proof. This follows from [48, Lemma 64]; strictly speaking, that lemma was phrased
for the semi-circular distribution rather than the Marchenko-Pastur distribution,
but an inspection of the proof shows the proof can be modified without difficulty.
See also [23] and [9, Corollary 4.2] for closely related lemmas. □



In view of this lemma, we see that to show Theorem 19, it suffices to show that for
each complex number $z$ with $a + \varepsilon/2 \leq \operatorname{Re}(z) \leq b - \varepsilon/2$ and $\operatorname{Im}(z) \geq \eta := \frac{K^2 \log^{19} n}{n}$,
one has

$$s(z) - s_{MP,y}(z) = o(1) \tag{27}$$

with (uniformly) overwhelming probability.

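For this, we return to the formula (18); inserting the identities (21), (22) one has

$$s(z) = \frac{1}{p} \sum_{k=1}^{p} \frac{1}{\xi_{kk} - z - Y_k}, \tag{28}$$

where

$$Y_k := \frac{1}{n} \sum_{j=1}^{p-1} \frac{\lambda_j(W_k)}{\lambda_j(W_k) - z} |X_k^* u_j(M_k)|^2.$$

Suppose we condition $M_k$ (and thus $W_k$) to be fixed; the entries of $X_k$ remain
independent with mean zero and variance 1, and thus (since the $u_j$ are unit vectors,
and $\frac{\lambda}{\lambda - z} = 1 + \frac{z}{\lambda - z}$)

$$\mathbf{E}(Y_k | M_k) = \frac{1}{n} \sum_{j=1}^{p-1} \frac{\lambda_j(W_k)}{\lambda_j(W_k) - z} = \frac{p-1}{n} (1 + z s_k(z)),$$

where

$$s_k(z) := \frac{1}{p-1} \sum_{j=1}^{p-1} \frac{1}{\lambda_j(W_k) - z}$$

is the Stieltjes transform of $W_k$.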

From the Cauchy interlacing law (Lemma 21) we see that the difference

$$s(z) - \frac{p-1}{p} s_k(z) = \frac{1}{p} \left( \sum_{j=1}^{p} \frac{1}{\lambda_j(W) - z} - \sum_{j=1}^{p-1} \frac{1}{\lambda_j(W_k) - z} \right)$$

is bounded in magnitude by $O(\frac{1}{p})$ times the total variation of the function $\lambda \mapsto \frac{1}{\lambda - z}$
on $[0, +\infty)$, which is $O(\frac{1}{\eta})$. Thus

$$\frac{p-1}{p} s_k(z) = s(z) + O\left(\frac{1}{p \eta}\right)$$

and thus

$$\mathbf{E}(Y_k | M_k) = \frac{p-1}{n} + \frac{p}{n} z s(z) + O\left(\frac{1}{n \eta}\right) = y + o(1) + (y + o(1)) z s(z) \tag{29}$$

since $p/n = y + o(1)$ and $1/\eta = o(n)$.

We will shortly show a similar bound for $Y_k$ itself:

Lemma 30 (Concentration of $Y_k$). For each $1 \leq k \leq p$, one has $Y_k = y + o(1) +
(y + o(1)) z s(z)$ with overwhelming probability (uniformly in $k$ and $I$).



Meanwhile, we have

$$\xi_{kk} = \frac{1}{n} \|X_k\|^2$$

and hence by Lemma 27, $\xi_{kk} = 1 + o(1)$ with overwhelming probability (again
uniformly in $k$ and $I$). Inserting these bounds into (28), one obtains

$$s(z) = \frac{1}{p} \sum_{k=1}^{p} \frac{1}{1 - z - (y + o(1)) - (y + o(1)) z s(z)}$$

with overwhelming probability; thus $s(z)$ "almost solves" (25) in some sense. From
the quadratic formula, the two solutions of (25) are $s_{MP,y}(z)$ and $-\frac{y + z - 1}{yz} -
s_{MP,y}(z)$. One concludes that with overwhelming probability, one has either

$$s(z) = s_{MP,y}(z) + o(1) \tag{30}$$

or

$$s(z) = -\frac{y + z - 1}{yz} + o(1) \tag{31}$$

or

$$s(z) = -\frac{y + z - 1}{yz} - s_{MP,y}(z) + o(1) \tag{32}$$

(with the convention that $\frac{y + z - 1}{yz} = 1$ when $y = 1$). By using an $n^{-100}$-net (say)
of possible $z$'s and using the union bound (and the fact that $s(z)$ has a Lipschitz
constant of at most $O(n^{10})$ (say) in the region of interest) we may assume that the
above trichotomy holds for all $z$ with $a + \varepsilon/2 \leq \operatorname{Re}(z) \leq b - \varepsilon/2$ and $\eta \leq \operatorname{Im}(z) \leq n^{10}$
(say).

When $\operatorname{Im}(z) = n^{10}$, then $s(z)$, $s_{MP,y}(z)$ are both $o(1)$, and so (30) holds in this
case. By continuity, we thus claim that either (30) holds for all $z$ in the domain of
interest, or there exists a $z$ such that (30) as well as one of (31) or (32) both hold.



From (25) one has

$$s_{MP,y}(z) \left( \frac{y + z - 1}{yz} + s_{MP,y}(z) \right) = -\frac{1}{yz},$$

which implies that the separation between $s_{MP,y}(z)$ and $-\frac{y + z - 1}{yz}$ is bounded from
below, which implies that (30), (31) cannot both hold (for $n$ large enough). Similarly,
from (24) we see that

$$\frac{y + z - 1}{yz} + 2 s_{MP,y}(z) = -\frac{\sqrt{(y + z - 1)^2 - 4yz}}{yz};$$

since $(y + z - 1)^2 - 4yz$ has zeroes only when $z = a, b$, and $z$ is bounded away from
these singularities, we see also that (30), (32) cannot both hold (for $n$ large enough).
Thus the continuity argument shows that (30) holds with uniformly overwhelming
probability for all $z$ in the region of interest for $n$ large enough, which gives (27)
and thus Theorem 19.



6. Proof of Theorem 16 and Theorem 17



We first prove Theorem 16. The arguments follow those in [48].

We begin by observing from Markov's inequality and the union bound that one has
$|\zeta_{ij}|, |\zeta'_{ij}| \leq n^{10/C_0}$ (say) for all $i, j$ with probability $1 - O(n^{-8})$. Thus, by truncation
(and adjusting the moments appropriately, using Lemma 22 to absorb the error),
one may assume without loss of generality that

$$|\zeta_{ij}|, |\zeta'_{ij}| \leq n^{10/C_0} \tag{33}$$

almost surely for all $i, j$. Next, by a further approximation argument we may
assume that the distribution of $M, M'$ is continuous. This is a purely qualitative
assumption, to ensure that the singular values are almost surely simple; our bounds
will not depend on any quantitative measure of the continuity, and so the general
case then follows by a limiting argument using Lemma 22.

The key technical step is the following theorem, whose proof is delayed to the next 
section. 

Theorem 31 (Truncated Four Moment Theorem). For sufficiently small $c_0 > 0$
and sufficiently large $C_0 > 0$ the following holds for every $0 < \varepsilon < 1$ and $k \geq 1$.
Let $M = (\zeta_{ij})_{1 \leq i \leq p, 1 \leq j \leq n}$ and $M' = (\zeta'_{ij})_{1 \leq i \leq p, 1 \leq j \leq n}$ be matrix ensembles obeying
condition C1 for some $C_0$, as well as (33). Assume that $p/n \to y$ for some $0 <
y \leq 1$, and that $\zeta_{ij}$ and $\zeta'_{ij}$ match to order 4.

Let $G : \mathbb{R}^k \times \mathbb{R}_+^k \to \mathbb{R}$ be a smooth function obeying the derivative bounds

$$|\nabla^j G(x_1, \ldots, x_k, q_1, \ldots, q_k)| \leq n^{c_0} \tag{34}$$

for all $0 \leq j \leq 5$ and $x_1, \ldots, x_k \in \mathbb{R}$, $q_1, \ldots, q_k \in \mathbb{R}_+$, and such that $G$ is supported
on the region $q_1, \ldots, q_k \leq n^{c_0}$, and the gradient $\nabla$ is in all $2k$ variables.

Then for any $\varepsilon p \leq i_1 < i_2 < \cdots < i_k \leq (1 - \varepsilon) p$, and for $n$ sufficiently large depending
on $\varepsilon, k, c_0$, we have

$$\left| \mathbf{E}(G(\sqrt{n} \sigma_{i_1}(M), \ldots, \sqrt{n} \sigma_{i_k}(M), Q_{i_1}(M), \ldots, Q_{i_k}(M))) - \mathbf{E}(G(\sqrt{n} \sigma_{i_1}(M'), \ldots, \sqrt{n} \sigma_{i_k}(M'), Q_{i_1}(M'), \ldots, Q_{i_k}(M'))) \right| \leq n^{-c_0}. \tag{35}$$

If $\zeta_{ij}, \zeta'_{ij}$ match to order 3, then the conclusion still holds as long as one strengthens
(34) to

$$|\nabla^j G(x_1, \ldots, x_k, q_1, \ldots, q_k)| \leq n^{-j c_1} \tag{36}$$

for some $c_1 > 0$, if $c_0$ is sufficiently small depending on $c_1$.



Given a $p \times n$ matrix $M$ we form the augmented matrix $\mathbf{M}$ defined in (9), whose
eigenvalues are $\pm \sigma_1(M), \ldots, \pm \sigma_p(M)$, together with the eigenvalue 0 with multiplicity
$n - p$ (if $p < n$). For each $1 \leq i \leq p$, we introduce (in analogy with the
arguments in [48]) the quantities

$$Q_i(M) := \frac{1}{n} \left( \sum_{1 \leq j \leq p : j \neq i} \frac{1}{|\sigma_j(M) - \sigma_i(M)|^2} + \frac{n - p}{\sigma_i(M)^2} + \sum_{j=1}^{p} \frac{1}{|\sigma_j(M) + \sigma_i(M)|^2} \right).$$

(The factor of $\frac{1}{n}$ in $Q_i(M)$ is present to align the notation here with that in [48],
in which one dilated the matrix by $\sqrt{n}$.) We set $Q_i(M) = \infty$ if the singular value
$\sigma_i$ is repeated, but this event occurs with probability zero since we are assuming
$M$ to be continuously distributed.

The gap property on $M$ ensures an upper bound on $Q_i(M)$:

Lemma 32. If $M$ satisfies the gap property, then for any $c_0 > 0$ (independent of
$n$), and any $\varepsilon p \leq i \leq (1 - \varepsilon) p$, one has $Q_i(M) \leq n^{c_0}$ with high probability.



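Proof. Observe the upper bound

$$Q_i(M) \leq \frac{2}{n} \sum_{1 \leq j \leq p : j \neq i} \frac{1}{|\sigma_j(M) - \sigma_i(M)|^2} + \frac{n - p + 1}{n \sigma_i(M)^2}. \tag{37}$$

From Corollary 20 we see that with overwhelming probability, $\sigma_i(M)^2 / n$ is bounded
away from zero, and so the last term in (37) is $O(1/n)$. To bound the other term in (37), one
repeats the proof of [48, Lemma 49]. □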



By applying a truncation argument exactly as in [48, Section 3.3], one can now
remove the hypothesis in Theorem 31 that $G$ is supported in the region $q_1, \ldots, q_k \leq
n^{c_0}$. In particular, one can now handle the case when $G$ is independent of $q_1, \ldots, q_k$;
and Theorem 16 follows after making the change of variables $\lambda = \frac{1}{n} \sigma^2$ and using
the chain rule (and Corollary ED]) . 

Next, we prove Theorem 17, assuming both Theorems 31 and 15. The main
observation here is the following lemma.

Lemma 33 (Matching lemma). Let $\zeta$ be a complex random variable with mean
zero, unit variance, and third moment bounded by some constant $a$. Then there
exists a complex random variable $\tilde\zeta$ with support bounded by the ball of radius $O_a(1)$
centered at the origin (and in particular, obeying the exponential decay hypothesis
uniformly in $\zeta$ for fixed $a$) which matches $\zeta$ to third order.

Proof. In order for $\tilde\zeta$ to match $\zeta$ to third order, it suffices that $\tilde\zeta$ have mean zero,
variance 1, and that $\mathbf{E} \tilde\zeta^3 = \mathbf{E} \zeta^3$ and $\mathbf{E} \tilde\zeta^2 \overline{\tilde\zeta} = \mathbf{E} \zeta^2 \overline{\zeta}$.

Accordingly, let $\Omega \subset \mathbb{C}^2$ be the set of pairs $(\mathbf{E} \zeta^3, \mathbf{E} \zeta^2 \overline{\zeta})$ where $\zeta$ ranges over
complex random variables with mean zero, variance one, and compact support.
Clearly $\Omega$ is convex. It is also invariant under the symmetry $(z, w) \mapsto (e^{3i\theta} z, e^{i\theta} w)$
for any phase $\theta$. Thus, if $(z, w) \in \Omega$, then $(-z, e^{i\pi/3} w) \in \Omega$, and hence by convexity
$(0, \frac{\sqrt{3}}{2} e^{i\pi/6} w) \in \Omega$, and hence by convexity and rotation invariance $(0, w') \in \Omega$
whenever $|w'| \leq \frac{\sqrt{3}}{2} |w|$. Since $(z, w)$ and $(0, -\frac{\sqrt{3}}{2} w)$ both lie in $\Omega$, by convexity
$(cz, 0)$ lies in it also for some absolute constant $c > 0$, and so again by convexity
and rotation invariance $(z', 0) \in \Omega$ whenever $|z'| \leq c|z|$. One last application of
convexity then gives $(z'/2, w'/2) \in \Omega$ whenever $|z'| \leq c|z|$ and $|w'| \leq \frac{\sqrt{3}}{2} |w|$.

It is easy to construct complex random variables with mean zero, variance one,
compact support, and arbitrarily large third moment. Since the third moment
is comparable to $|z| + |w|$, we thus conclude that $\Omega$ contains all of $\mathbb{C}^2$, i.e. every
complex random variable with finite third moment with mean zero and unit variance
can be matched to third order by a variable of compact support. An inspection of
the argument shows that if the third moment is bounded by $a$ then the support
can also be bounded by $O_a(1)$. □
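The two elementary facts used in the proof, the rotation symmetry of the moment pair and the constants $\frac{\sqrt{3}}{2} e^{i\pi/6}$ and $c$ coming from the convexity steps, can be checked on a concrete example. The sketch below uses a hypothetical two-atom random variable (value 2 with probability 0.2, value -0.5 with probability 0.8), chosen only because it has mean zero and unit variance; the explicit value $c = \sqrt{3}/(2+\sqrt{3})$ is the one obtained by solving the convex-combination equation and is not stated in the text.

```python
import cmath
import math


def moments(atoms):
    """Mean, variance, E zeta^3 and E zeta^2 conj(zeta) of a finite law given as (value, prob) pairs."""
    mean = sum(p * v for v, p in atoms)
    var = sum(p * abs(v) ** 2 for v, p in atoms)
    z = sum(p * v ** 3 for v, p in atoms)
    w = sum(p * v ** 2 * v.conjugate() for v, p in atoms)
    return mean, var, z, w


def rotate(atoms, theta):
    """Law of exp(i * theta) * zeta."""
    phase = cmath.exp(1j * theta)
    return [(phase * v, p) for v, p in atoms]


atoms = [(2 + 0j, 0.2), (-0.5 + 0j, 0.8)]          # hypothetical example variable
_, _, z, w = moments(atoms)

# Symmetry: rotating zeta by theta maps (z, w) to (e^{3 i theta} z, e^{i theta} w).
theta = 0.7
_, _, z_rot, w_rot = moments(rotate(atoms, theta))

# Convexity: an equal mixture of zeta and e^{i pi / 3} zeta has moment pair
# (0, (sqrt(3)/2) e^{i pi / 6} w) and still has mean zero and unit variance.
rotated = rotate(atoms, math.pi / 3)
mixture = [(v, p / 2) for v, p in atoms] + [(v, p / 2) for v, p in rotated]
mean_mix, var_mix, z_mix, w_mix = moments(mixture)

# Weight c on (z, w) and 1 - c on (0, -(sqrt(3)/2) w) cancels the second coordinate.
c = math.sqrt(3) / (2 + math.sqrt(3))
second_coordinate = c * w + (1 - c) * (-(math.sqrt(3) / 2) * w)
```

For this example variable one finds $z = w = 1.5$, and the mixture step indeed kills the first coordinate while shrinking the second by the factor $\frac{\sqrt{3}}{2}$.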

Now consider a random matrix $M$ as in Theorem 17 with atom variables $\zeta_{ij}$. By
the above lemma, for each $i, j$, we can find $\zeta'_{ij}$ which satisfies the exponential decay
hypothesis and matches $\zeta_{ij}$ to third order. Let $\eta(q)$ be a smooth cutoff to the region
$q \leq n^{c_0}$ for some $c_0 > 0$ independent of $n$, and let $\varepsilon p \leq i \leq (1 - \varepsilon) p$. By Theorem
15, the matrix $M'$ formed by the $\zeta'_{ij}$ satisfies the gap property. By Lemma 32,

$$\mathbf{E}\, \eta(Q_i(M')) = 1 - O(n^{-c_1})$$

for some $c_1 > 0$ independent of $n$, so by Theorem 31 one has

$$\mathbf{E}\, \eta(Q_i(M)) = 1 - O(n^{-c_2})$$

for some $c_2 > 0$ independent of $n$. We conclude that $M$ also obeys the gap property.

The next two sections are devoted to the proofs of Theorem 31 and Theorem 15
respectively.

Remark 34. The above trick to remove the exponential decay hypothesis for Theorem
15 also works to remove the same hypothesis in [48, Theorem 19]. The point
is that in the analogue of Theorem 31 in that paper (implicit in [48, Section 3.3]),
the exponential decay hypothesis is not used anywhere in the argument; only a
uniformly bounded $C_0$ moment for $C_0$ large enough is required, as is the case here.
Because of this, one can replace all the exponential decay hypotheses in the results
of [48], [49] by a hypothesis of bounded $C_0$ moment; we omit the details.






7. The proof of Theorem 31



It remains to prove Theorem 31. By telescoping series, it suffices to establish a
bound

$$\left| \mathbf{E}(G(\sqrt{n} \sigma_{i_1}(M), \ldots, \sqrt{n} \sigma_{i_k}(M), Q_{i_1}(M), \ldots, Q_{i_k}(M))) - \mathbf{E}(G(\sqrt{n} \sigma_{i_1}(M'), \ldots, \sqrt{n} \sigma_{i_k}(M'), Q_{i_1}(M'), \ldots, Q_{i_k}(M'))) \right| \leq n^{-2 - c_0} \tag{38}$$

under the assumption that the coefficients $\zeta_{ij}, \zeta'_{ij}$ of $M$ and $M'$ are identical except
in one entry, say the $qr$ entry for some $1 \leq q \leq p$ and $1 \leq r \leq n$, since the claim then
follows by interchanging each of the $pn = O(n^2)$ entries of $M$ into $M'$ separately.
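The telescoping step is purely combinatorial: one walks from $M$ to $M'$ through a chain of hybrid matrices, consecutive members of which differ in a single entry, so the total difference of any statistic is the sum of the one-entry increments. The sketch below illustrates only this bookkeeping; the function `statistic` is a hypothetical stand-in for the expectation of $G$, and the small matrices are made up for the example.

```python
def hybrids(M, M_prime):
    """Return the chain of matrices obtained by swapping entries of M for those of M_prime one at a time."""
    current = [row[:] for row in M]
    chain = [[row[:] for row in current]]
    for q in range(len(M)):
        for r in range(len(M[0])):
            current[q][r] = M_prime[q][r]
            chain.append([row[:] for row in current])
    return chain


def statistic(M):
    """Stand-in for E G(...): any real-valued function of the matrix satisfies the telescoping identity."""
    return sum(abs(entry) ** 2 for row in M for entry in row) ** 0.5


def entries_changed(A, B):
    """Number of positions in which two matrices of the same shape differ."""
    return sum(1 for ra, rb in zip(A, B) for x, y in zip(ra, rb) if x != y)


M = [[1.0, -2.0, 0.5], [0.0, 3.0, -1.0]]
M_prime = [[-1.0, 1.0, 1.0], [2.0, 0.0, 1.0]]

chain = hybrids(M, M_prime)
increments = [statistic(chain[t]) - statistic(chain[t + 1]) for t in range(len(chain) - 1)]
total = statistic(M) - statistic(M_prime)
```

With $pn$ increments each bounded as in (38), the triangle inequality gives a total of $O(n^2) \cdot n^{-2-c_0}$, which is the source of the final $n^{-c_0}$ type bound.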

Write $M(z)$ for the matrix $M$ (or $M'$) with the $qr$ entry replaced by $z$. We apply
the following proposition, which follows from a lengthy argument in [48]:

Proposition 35 (Replacement given a good configuration). Let the notation and
assumptions be as in Theorem 31. There exists a positive constant $C_1$ (independent
of $k$) such that the following holds. Let $\varepsilon_1 > 0$. We condition (i.e. freeze) all the
entries of $M(z)$ to be constant, except for the $qr$ entry, which is $z$. We assume that
for every $1 \leq j \leq k$ and every $|z| \leq n^{1/2 + \varepsilon_1}$ whose real and imaginary parts are
multiples of $n^{-C_1}$, we have

• (Singular value separation) For any $1 \leq i \leq n$ with $|i - i_j| \geq n^{\varepsilon_1}$, we have

$$|\sqrt{n} \sigma_i(M(z)) - \sqrt{n} \sigma_{i_j}(M(z))| \geq n^{-\varepsilon_1} |i - i_j|. \tag{39}$$

Also, we assume

$$\sqrt{n} \sigma_{i_j}(M(z)) \geq n^{-\varepsilon_1} n. \tag{40}$$

• (Delocalization at $i_j$) If $u_{i_j}(M(z)) \in \mathbb{C}^n$, $v_{i_j}(M(z)) \in \mathbb{C}^p$ are unit right and
left singular vectors of $M(z)$, then

$$|e_q^* v_{i_j}(M(z))|, |e_r^* u_{i_j}(M(z))| \leq n^{-1/2 + \varepsilon_1}. \tag{41}$$

• For every $\alpha \geq 0$,

$$\|P_{i_j, \alpha}(M(z)) e_r\|, \|P'_{i_j, \alpha}(M(z)) e_q\| \leq 2^{\alpha/2} n^{-1/2 + \varepsilon_1}, \tag{42}$$

whenever $P_{i_j, \alpha}$ (resp. $P'_{i_j, \alpha}$) is the orthogonal projection to the span of
right singular vectors $u_i(M(z))$ (resp. left singular vectors $v_i(M(z))$) corresponding
to singular values $\sigma_i(M(z))$ with $2^\alpha \leq |i - i_j| < 2^{\alpha + 1}$.

We say that $M(0), e_q, e_r$ are a good configuration for $i_1, \ldots, i_k$ if the above properties
hold. Assuming this good configuration, we have (38) if $\zeta_{ij}$ and $\zeta'_{ij}$ match
to order 4, or if they match to order 3 and (36) holds.



Proof. This follows by applying [48, Proposition 46] to the $(p + n) \times (p + n)$ Hermitian
matrix $A(z) := \sqrt{n}\, \mathbf{M}(z)$, where $\mathbf{M}(z)$ is the augmented matrix of $M(z)$, defined
in (9). Note that the eigenvalues of $A(z)$ are $\pm \sqrt{n} \sigma_1(M(z)), \ldots, \pm \sqrt{n} \sigma_p(M(z))$
and 0, and that the eigenvectors are given (up to unit phases) by

$$\frac{1}{\sqrt{2}} \begin{pmatrix} u_j(M(z)) \\ \pm v_j(M(z)) \end{pmatrix}.$$

Note also that the analogue of (42) in [48, Proposition 46] is trivially true if $2^\alpha$ is
comparable to $n$, so one can restrict attention to the regime $2^\alpha = o(n)$. □

In view of the above proposition, we see that to conclude the proof of Theorem 31
(and thus Theorem 16) it suffices to show that for any $\varepsilon_1 > 0$, $M(0), e_q, e_r$ are a
good configuration for $i_1, \ldots, i_k$ with overwhelming probability, if $C_0$ is sufficiently
large depending on $\varepsilon_1$ (cf. [48, Proposition 48]).

Our main tools for this are Theorem 18 and Theorem 19. Actually, we need a
slight variant:

Proposition 36. The conclusions of Theorem 18 and Theorem 19 continue to hold
if one replaces the $qr$ entry of $M$ by a deterministic number $z = O(n^{1/2 + O(1/C_0)})$.

This is proven exactly as in [48, Corollary 63] and is omitted.

We return to the task of establishing a good configuration with overwhelming probability.
By the union bound we may fix $1 \leq j \leq k$, and also fix the $|z| \leq n^{1/2 + \varepsilon_1}$
whose real and imaginary parts are multiples of $n^{-C_1}$. By the union bound again
and Proposition 36, the singular value separation condition (39) holds with overwhelming
probability for every $1 \leq i \leq n$ with $|i - i_j| \geq n^{\varepsilon_1}$ (if $C_0$ is sufficiently large), as
does (41). A similar argument using Pythagoras' theorem and Corollary 20 gives
(42) with overwhelming probability (noting as before that we may restrict attention
to the regime $2^\alpha = o(n)$). Corollary 20 also gives (40) with overwhelming
probability. This gives the claim, and Theorem 16 follows.



8. Proof of Theorem 15



We now prove Theorem 15, closely following the analogous arguments in [48].
Using the exponential decay condition, we may truncate the $\zeta_{ij}$ (and renormalise
moments, using Lemma 22) to assume that

$$|\zeta_{ij}| \leq \log^{O(1)} n \tag{43}$$

almost surely. By a limiting argument we may assume that $M$ has a continuous
distribution, so that the singular values are almost surely simple.

We write $i_0$ instead of $i$, $p_0$ instead of $p$, and write $N_0 := p_0 + n$. As in [48], the
strategy is to propagate a narrow gap for $M = M_{p_0, n}$ backwards in the $p$ variable,
until one can use Theorem 19 to show that the gap occurs with small probability.

More precisely, for any $1 \leq i - l < i \leq p \leq p_0$, we let $M_{p,n}$ be the $p \times n$ matrix
formed using the first $p$ rows of $M_{p_0, n}$, and we define (following [48]) the regularized
gap

$$g_{i,l,p} := \inf_{1 \leq i_- \leq i - l < i \leq i_+ \leq p} \frac{\sqrt{N_0}\, \sigma_{i_+}(M_{p,n}) - \sqrt{N_0}\, \sigma_{i_-}(M_{p,n})}{\min(i_+ - i_-, \log^{C_1} N_0)^{\log^{0.9} N_0}}, \tag{44}$$

where $C_1 > 1$ is a large constant to be chosen later. It will suffice to show that

$$g_{i_0, 1, p_0} \leq n^{-c_0}. \tag{45}$$
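To make the definition (44) concrete, the infimum can be evaluated by brute force over all admissible pairs $(i_-, i_+)$. The sketch below is only illustrative: it assumes the singular values are listed in non-decreasing order with 1-based indices as in the text, it takes the exponent $\log^{0.9} N_0$ and the cap $\log^{C_1} N_0$ at face value from (44), and the sample values and the choice $C_1 = 2$ are hypothetical.

```python
import math


def regularized_gap(sigmas, i, l, N0, C1=2.0):
    """Brute-force evaluation of the infimum in (44) for a non-decreasing list of singular values."""
    p = len(sigmas)
    exponent = math.log(N0) ** 0.9
    cap = math.log(N0) ** C1
    best = math.inf
    for i_minus in range(1, i - l + 1):        # 1 <= i_minus <= i - l
        for i_plus in range(i, p + 1):         # i <= i_plus <= p
            spread = math.sqrt(N0) * (sigmas[i_plus - 1] - sigmas[i_minus - 1])
            best = min(best, spread / min(i_plus - i_minus, cap) ** exponent)
    return best


sigmas = [1.0, 1.3, 1.45, 1.5, 1.8, 2.4, 2.5]   # hypothetical singular values
g_l1 = regularized_gap(sigmas, i=4, l=1, N0=20)
g_l2 = regularized_gap(sigmas, i=4, l=2, N0=20)
```

Increasing $l$ shrinks the set of admissible $i_-$, so the regularized gap can only grow with $l$; this monotonicity is what makes comparisons such as (47) between $g_{i_0, l+1, p}$ and $g_{i_0, l, p+1}$ meaningful.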

The main tool for this is 

Lemma 37 (Backwards propagation of gap). Suppose that $p_0/2 \leq p < p_0$ and
$l \leq \varepsilon p / 10$ is such that

$$g_{i_0, l, p+1} \leq \delta \tag{46}$$

for some $0 < \delta \leq 1$ (which can depend on $n$), and that

$$g_{i_0, l+1, p} \geq 2^m g_{i_0, l, p+1} \tag{47}$$

for some $m \geq 0$ with

$$2^m \leq \delta^{-1/2}. \tag{48}$$

Let $X_{p+1}$ be the $(p+1)$th row of $M_{p_0, n}$, and let $u_1(M_{p,n}), \ldots, u_p(M_{p,n})$ be an orthonormal
system of right singular vectors of $M_{p,n}$ associated to $\sigma_1(M_{p,n}), \ldots, \sigma_p(M_{p,n})$.
Then one of the following statements hold:

(i) (Macroscopic spectral concentration) There exists $1 \leq i_- < i_+ \leq p + 1$
with $i_+ - i_- \geq \log^{C_1/2} n$ such that

$$|\sqrt{n} \sigma_{i_+}(M_{p+1,n}) - \sqrt{n} \sigma_{i_-}(M_{p+1,n})| \leq \delta^{1/4} \exp(\log^{0.95} n)(i_+ - i_-).$$

(ii) (Small inner products) There exists $\varepsilon p / 2 \leq i_- \leq i_0 - l < i_0 \leq i_+ \leq
(1 - \varepsilon/2) p$ with $i_+ - i_- \leq \log^{C_1/2} n$ such that

$$\sum_{i_- \leq j < i_+} |X_{p+1}^* u_j(M_{p,n})|^2 \leq \frac{i_+ - i_-}{2^{m/2} \log^{0.01} n}. \tag{49}$$

(iii) (Large singular value) For some $1 \leq i \leq p + 1$ one has

$$|\sigma_i(M_{p+1,n})| \geq \frac{\sqrt{n} \exp(-\log^{0.95} n)}{\delta^{1/2}}.$$

(iv) (Large inner product in bulk) There exists $\varepsilon p / 10 \leq i \leq (1 - \varepsilon/10) p$ such
that

$$|X_{p+1}^* u_i(M_{p,n})|^2 \geq \frac{\exp(-\log^{0.96} n)}{\delta^{1/2}}.$$

(v) (Large row) We have

$$\|X_{p+1}\|^2 \geq \frac{n \exp(-\log^{0.96} n)}{\delta^{1/2}}.$$

(vi) (Large inner product near $i_0$) There exists $\varepsilon p / 10 \leq i \leq (1 - \varepsilon/10) p$ with
$|i - i_0| \leq \log^{C_1} n$ such that

$$|X_{p+1}^* u_i(M_{p,n})|^2 \geq 2^{m/2} n \log^{0.8} n.$$

Proof. This follows by applying [48, Lemma 51] to the $(p + n + 1) \times (p + n + 1)$
Hermitian matrix

$$A_{p+n+1} := \sqrt{n} \begin{pmatrix} 0 & M_{p+1,n}^* \\ M_{p+1,n} & 0 \end{pmatrix},$$

which after removing the bottom row and rightmost column (which is $X_{p+1}$, plus
$p + 1$ zeroes) yields the $(p + n) \times (p + n)$ Hermitian matrix

$$A_{p+n} := \sqrt{n} \begin{pmatrix} 0 & M_{p,n}^* \\ M_{p,n} & 0 \end{pmatrix},$$

which has eigenvalues $\pm \sqrt{n} \sigma_1(M_{p,n}), \ldots, \pm \sqrt{n} \sigma_p(M_{p,n})$ and 0, and an orthonormal
eigenbasis that includes the vectors $\frac{1}{\sqrt{2}} \begin{pmatrix} u_i(M_{p,n}) \\ \pm v_i(M_{p,n}) \end{pmatrix}$ for $1 \leq i \leq p$. (The "large
coefficient" event in [48, Lemma 51(iii)] cannot occur here, as $A_{p+n+1}$ has zero
diagonal.) □

By repeating the arguments in [48, Section 3.5] almost verbatim, it then suffices
to show that

Proposition 38 (Bad events are rare). Suppose that $p_0/2 \leq p < p_0$ and $l \leq \varepsilon p / 10$,
and set $\delta := n_0^{-\kappa}$ for some sufficiently small fixed $\kappa > 0$. Then:

(a) The events (i), (iii), (iv), (v) in Lemma 37 all fail with high probability.

(b) There is a constant $C'$ such that all the coefficients of the right singular
vectors $u_j(M_{p,n})$ for $\varepsilon p / 2 \leq j \leq (1 - \varepsilon/2) p$ are of magnitude at most
$n^{-1/2} \log^{C'} n$ with overwhelming probability. Conditioning $M_{p,n}$ to be a
matrix with this property, the events (ii) and (vi) occur with a conditional
probability of at most $2^{-\kappa m} + n^{-\kappa}$.

(c) Furthermore, there is a constant $C_2$ (depending on $C', \kappa, C_1$) such that if
$l \geq C_2$ and $M_{p,n}$ is conditioned as in (b), then (ii) and (vi) in fact occur
with a conditional probability of at most $2^{-\kappa m} \log^{-2 C_1} n + n^{-\kappa}$.

But Proposition 38 can be proven by repeating the proof of [48, Proposition 53]
with only cosmetic changes, the only significant difference being that Theorem 19
and Theorem 18 are applied instead of [48, Theorem 60] and [48, Proposition 62]
respectively.